First-order Methods Almost Always Avoid Saddle Points
Authors
Abstract
We establish that first-order methods avoid saddle points for almost all initializations. Our results apply to a wide variety of first-order methods, including gradient descent, block coordinate descent, mirror descent, and variants thereof. The connecting thread is that such algorithms can be studied from a dynamical systems perspective in which appropriate instantiations of the Stable Manifold Theorem allow for a global stability analysis. Thus, neither access to second-order derivative information nor randomness beyond initialization is necessary to provably avoid saddle points.
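To make the claim concrete, here is a minimal illustrative sketch in Python/NumPy (not taken from the paper, and not its proof technique): plain gradient descent on the toy strict-saddle function f(x, y) = x^2 - y^2. From a randomly drawn initialization the y-coordinate is almost surely nonzero, so the iterates move away from the saddle at the origin rather than converging to it.

```python
import numpy as np

# Toy saddle: f(x, y) = x^2 - y^2 has a strict saddle at the origin.
def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

rng = np.random.default_rng(0)
step = 0.1  # step size well below 1/L, where L = 2 bounds the gradient's Lipschitz constant

# A random (Lebesgue-typical) initialization almost surely has y != 0,
# so the iterates leave any neighborhood of the saddle at (0, 0).
p = rng.uniform(-1.0, 1.0, size=2)
for _ in range(100):
    p = p - step * grad(p)

print(p)  # x has shrunk toward 0 while |y| has grown: no convergence to the saddle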
Similar Papers
How to Escape Saddle Points Efficiently
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost “dimension-free”). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. When all saddle points are...
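The following is an illustrative sketch of the perturbation idea only, not the paper's exact algorithm (which also controls how often perturbations occur and includes termination checks); the function name perturbed_gd and all parameter values are assumptions chosen for the example. Whenever the gradient is nearly zero, a small random perturbation is injected so the iterates can leave a strict saddle.

```python
import numpy as np

def perturbed_gd(grad, x0, step=0.1, eps=1e-3, radius=1e-2, iters=500, seed=0):
    """Gradient descent that adds a small random perturbation whenever the
    gradient is nearly zero, to push iterates off strict saddle points.
    Simplified sketch; the published method additionally tracks the time since
    the last perturbation and verifies sufficient function decrease."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            # Sample a point uniformly from a ball of the given radius and perturb.
            d = rng.normal(size=x.shape)
            d *= radius * rng.uniform() ** (1.0 / x.size) / np.linalg.norm(d)
            x = x + d
        else:
            x = x - step * g
    return x

# Example: start exactly at the saddle of f(x, y) = x^2 - y^2; plain gradient
# descent would stay put, but the perturbation lets the iterates escape along y.
saddle_grad = lambda p: np.array([2.0 * p[0], -2.0 * p[1]])
print(perturbed_gd(saddle_grad, [0.0, 0.0]))
```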
Gradient Descent Can Take Exponential Time to Escape Saddle Points
Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not...
A Generic Approach for Escaping Saddle points
A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points. First-order methods often get stuck at saddle points, greatly deteriorating their performance. Typically, to escape from saddles one has to use second-order methods. However, most works on second-order methods rely extensively on expensive Hessian-based computations, making them ...
The Theory of Discrete Lagrange Multipliers for Nonlinear Discrete Optimization
In this paper we present a Lagrange-multiplier formulation of discrete constrained optimization problems, the associated discrete-space first-order necessary and sufficient conditions for saddle points, and an efficient first-order search procedure that looks for saddle points in discrete space. Our new theory provides a strong mathematical foundation for solving general nonlinear discrete opti...
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima ...
Journal: CoRR
Volume: abs/1710.07406
Pages: -
Publication date: 2017